Survey: Transformer based video-language pre-training
Authors
Abstract
Inspired by the success of transformer-based pre-training methods on natural language tasks and, more recently, on computer vision tasks, researchers have begun to apply transformers to video processing. This survey aims to provide a comprehensive overview of transformer-based pre-training methods for Video-Language learning. We first briefly introduce the transformer structure as background knowledge, including the attention mechanism, position encoding, etc. We then describe the typical pre-training & fine-tuning paradigm for Video-Language processing in terms of proxy tasks, downstream tasks, and commonly used datasets. Next, we categorize models into Single-Stream and Multi-Stream structures, highlight their innovations, and compare their performances. Finally, we analyze and discuss the current challenges and possible future research directions for Video-Language pre-training.
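Since the abstract names the attention mechanism and position encoding as the background components of the transformer, a minimal sketch of both may help. The NumPy implementation below is illustrative only, not code from the survey or the surveyed models; the function names, toy shapes, and the video-tokens-attending-to-text-tokens example are all assumptions.

```python
# Minimal NumPy sketch of two transformer components named in the abstract.
# Illustrative only: shapes, names, and the toy example are assumptions,
# not code from the surveyed models.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, len_q, len_k)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (batch, len_q, d_v)

def sinusoidal_position_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...).
    Assumes an even d_model."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions
    return pe

# Toy cross-modal usage: 2 video-frame tokens attending over 4 text tokens.
Q = np.random.randn(1, 2, 8)                           # queries from video
K = np.random.randn(1, 4, 8)                           # keys from text
V = np.random.randn(1, 4, 8)                           # values from text
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)                                       # (1, 2, 8)
print(sinusoidal_position_encoding(4, 8).shape)        # (4, 8)
```

In Multi-Stream video-language models, this same operation is typically what performs cross-modal fusion, with queries drawn from one modality and keys/values from the other.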
Similar resources
The effect of lexically based language teaching (LBLT) on vocabulary learning among Iranian pre-university students
The aim of the present study is to investigate the effect of the lexically based (word-centered) teaching method on vocabulary learning among pre-university students. To this end, two groups of pre-university students (sixty in total), who were studying in Noorabad, Lorestan province, during the 1389 (2010-11) academic year, were selected and assigned by convention as the experimental and control groups. First, in order to ensure that the two groups were homogeneous in vocabulary knowledge, a ...
Content-Based Pre-Indexed Video
The viability of large distributed image databases is strongly dependent on the development of new image representations capable of providing support for extended functionality, directly in the compressed domain. We have recently introduced one such representation (Library-based coding) which we now augment with statistical pre-indexing schemes, automatically built at the time of encoding, tha...
Video survey of pre-grasp interactions in natural hand activities
• Objects are often movable in the environment and do not have to be grasped from the presented placement.
• Pre-grasp interaction can adjust object configuration in the environment to improve the task conditions for the final grasp.
• Our video observation surveys the variety of pre-grasp interactions used by people in natural task settings.
• The observed pre-grasp interactions can be described by...
Language Generation with Recurrent Generative Adversarial Networks without Pre-training
Generative Adversarial Networks (GANs) have shown great promise recently in image generation. Training GANs for text generation has proven to be more difficult, because of the non-differentiable nature of generating text with recurrent neural networks. Consequently, past work has either resorted to pre-training with maximum likelihood or used convolutional networks for generation. In this work, ...
Feed forward pre-training for recurrent neural network language models
The recurrent neural network language model (RNNLM) has been demonstrated to consistently reduce perplexities and automatic speech recognition (ASR) word error rates across a variety of domains. In this paper we propose a pre-training method for the RNNLM, by sharing the output weights of the feed forward neural network language model (NNLM) with the RNNLM. This is accomplished by first fine-tu...
Journal
Journal title: AI Open
Year: 2022
ISSN: 2666-6510
DOI: https://doi.org/10.1016/j.aiopen.2022.01.001